Genetic Programming for Document Segmentation and Region Classification Using Discipulus

Authors

  • N. Priyadharshini
  • M. S. Vijaya
Abstract

Document segmentation is the process of dividing a document into distinct regions. A document is a collection of information and a standard means of conveying that information to others. Extracting data from documents manually takes considerable human effort and time, which can severely limit the usefulness of information systems. Automatic information extraction from documents has therefore become an important problem, and document segmentation has been shown to help address it. This paper proposes a new approach to segmenting a document and classifying its regions as text, image, drawing, or table. The document image is divided into blocks using the run-length smearing rule, and features are extracted from every block. The Discipulus tool is used to build a genetic programming based classifier model, which achieved 97.5% classification accuracy.

Keywords: Document analysis; Information retrieval; Classification; Feature extraction; Document segmentation.
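The run-length smearing step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly described formulation, in which short white gaps between ink pixels are filled along rows and along columns and the two results are combined with a logical AND. The abstract does not give the paper's thresholds or exact variant, so the function names `smear_row` and `rlsa` and the threshold parameters are hypothetical.

```python
def smear_row(row, threshold):
    """Fill white gaps (0s) of length <= threshold lying between two ink pixels (1s).

    Assumption: gaps touching the row border are left untouched.
    """
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            gap = i - last_ink - 1 if last_ink is not None else 0
            if 0 < gap <= threshold:
                for j in range(last_ink + 1, i):
                    out[j] = 1
            last_ink = i
    return out


def rlsa(image, h_threshold, v_threshold):
    """Smear rows and columns separately, then AND the two results.

    `image` is a list of equal-length rows of 0/1 values (1 = ink).
    """
    horizontal = [smear_row(r, h_threshold) for r in image]
    # Transpose, smear each column as if it were a row, then transpose back.
    columns = [list(c) for c in zip(*image)]
    vertical = [list(r) for r in zip(*[smear_row(c, v_threshold) for c in columns])]
    return [[h & v for h, v in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]
```

In a sketch like this, connected regions of the smeared image would then serve as the candidate blocks from which features are extracted. Larger thresholds merge characters into words and lines, while smaller ones keep regions separate.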


Similar resources

Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction, and document classification. A document image is enhanced in the preprocessing stage by removing noise, binarizing, and detecting and correcting image skew. In document segmentation, an algorith...


Genetic Programming based DNA Microarray Analysis for Classification of Cancer

In this study the advantages of statistical gene selection are combined with the power of Genetic Programming (GP) to build classifiers for assigning gene expression microarray data samples to categories characteristic of certain cell states. To that end we implemented different statistical measures in a program called GENEACTIVATOR and tested their applicability to gene selection. Subsequently...



Testing Discipulus Linear Genetic Programming Software on Real-world Environmental Engineering Challenges

Genetic Programming (GP) is a machine learning technique that automatically writes computer programs. Although individual researchers used GP techniques in the 1960s and 1970s, GP emerged as a distinct discipline in 1992. Since that time, over one thousand academic studies have been published in the field and, in 1998, commercial GP software, Discipulus, reached the market. Discipulus is a...


Comparison of Discipulus™ Linear Genetic Programming Software with Support Vector Machines, Classification Trees, Neural Networks and Human Experts

Discipulus™ is multiple-run, linear, genetic-programming software. Various versions have been available commercially since 1998 (see www.aimlearning.com). Discipulus creates models directly from data, like neural networks or support vector machines. This white paper reports on the result of a multi-year study of the performance of Discipulus by Science Applications International Corp (SAIC) a...



Journal:
  • CoRR

Volume: abs/1303.0460

Publication date: 2013